- Multiprocessing White Paper
- Introduction
- Today's users of computer systems have been demanding more and more
- power. To accommodate this need, computer vendors have been pushing the
- technology limits of hardware and have turned to RISC systems due to
- their superior performance. But some application environments require
- still more power than is available from single processor systems.
- Multiprocessor systems have thus been developed to provide a further
- performance boost.
- The idea behind multiprocessing is simple. If one processor does not
- provide enough performance, then add more processors.
- Multiprocessing (MP) systems let users reach high performance levels
- without waiting for next-generation CPUs, while at the same time
- leveraging existing hardware. The challenge of MP systems is ensuring
- that applications can take advantage of the full performance available
- to them from the additional processors.
- MP systems have been available for a number of years. IBM mainframes
- since the late 1960s have used multiprocessing to provide a range of
- different performance levels. HP also implemented multiprocessing in the
- early 1980s with its line of UNIX workstations. Just as minicomputers
- are moving into mainframe levels of performance, multiprocessing is
- moving more and more from proprietary environments into the open
- environment of UNIX. Figure 1 shows how the rapid performance growth
- of HP's PA-RISC systems has been extended with the extra performance
- levels available with MP. The availability of UNIX MP systems is an
- exciting development because it means that users and businesses not
- only gain the advantages of open systems, but also the power needed
- for the most demanding environments.
-
- [Figure 1: Illustration: Series 800 Performance Growth]
-
- Definition of Multiprocessing
- Multiprocessing is a design in which multiple CPUs are connected
- within one system to provide additional computing power and allow many
- tasks to run in parallel. This is more than multitasking. Multitasking
- systems allow many jobs or processes to be active on a system, rather
- than running each job serially to completion. However, that does not
- mean that all jobs are running at any one moment. For example, figure
- 2 shows how a single CPU system parcels out CPU time to multiple jobs
- and switches between running jobs. If the CPU switches between jobs
- quickly enough, it gives a user the appearance that many jobs are
- indeed running simultaneously, even though in actuality only one job
- is running at a time. For multiprocessing systems, figure 2 shows that
- each CPU continues to switch between running jobs, but now jobs are
- truly running in parallel, with jobs running on each CPU (of course
- each individual CPU continues to multitask).
-
- [Figure 2: Illustration: Job Scheduling]
-
- Multiprocessing also means more than having a multiuser or timeshare
- system. As with multitasking, a system with a single CPU can switch
- between jobs for different users and, if it has the performance to
- switch quickly enough, gives each user the appearance that they have a
- dedicated computer at the other end of their display. A multiprocessor
- system essentially provides more CPU resources that can thus be shared
- simultaneously by many users without a drop in response time.
- Another common misconception about MP systems is that they are fault
- tolerant systems. While fault tolerant systems do have multiple CPUs,
- as figure 3 shows for the HP 9000 Series 1200 systems, they also have
- fully redundant hardware for system buses, memory, I/O, peripherals
- and power supplies. This redundancy protects fault tolerant systems
- from unplanned downtime due to the failure of any one hardware
- component. MP systems are not fully redundant, but they can take
- advantage of the additional CPUs to ensure continued operation if one
- CPU fails. For example, if a CPU failure in the HP 9000 Series 890S MP
- Corporate Business Server causes the system to go down, the HP 890S
- will bring itself down and then boot up again with the faulty CPU
- reconfigured out of the system. The HP 890S will continue to run at a
- reduced performance level until the CPU board can be repaired or
- replaced.
-
- [Figure 3: Illustration: HP 9000 Series 1200 Fault Tolerant System]
-
- Building an MP system requires more than just adding extra CPU boards
- to a system. In actuality, there is a continuum of multiprocessing
- systems, from loosely coupled machines on a network to multiple CPUs
- in one system acting as if there were only one CPU in the machine. The
- key measures of MP systems are how transparent the implementation is
- to users and programmers, and how well the additional power of the
- extra CPUs is utilized.
-
- Types of Multiprocessing Systems
- Several terms have been suggested to describe different types of
- computer systems and provide a useful framework for understanding
- multiprocessing systems in relation to other types of systems. In this
- framework (often called Flynn's taxonomy), computers can be grouped
- into three classes:
- SISD (single instruction, single data stream). A SISD machine is a
- typical single CPU system, where one CPU executes one instruction and
- works on one piece of data at any one time.
- SIMD (single instruction, multiple data stream). One example of a
- SIMD machine is a vector processor, where an array of data is loaded
- into several registers and then the same operation (e.g. multiply by
- pi) is performed on all the data simultaneously. SIMD machines are
- mostly used for scientific and engineering applications where intense
- numeric computations and simulations are needed.
- MIMD (multiple instruction, multiple data stream). This is where
- many CPUs execute different instructions on different pieces of data.
- However, MIMD systems differ in how instructions and data are shared
- between CPUs and in how much each CPU interacts with the other CPUs.
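The difference between the SISD and SIMD models can be sketched in a few lines of Python. This is purely an illustrative model (real vector hardware operates on fixed-width registers, not lists), and the function names are invented:

```python
import math

# A SISD machine applies one instruction to one datum per step:
def sisd_scale(data, factor):
    result = []
    for x in data:                  # one element per instruction step
        result.append(x * factor)
    return result

# A SIMD machine (e.g. a vector processor) applies the same instruction
# to every element of a vector register at once. We model a "vector
# register" as a fixed-width slice processed in a single step:
def simd_scale(data, factor, width=4):
    result = []
    for i in range(0, len(data), width):
        register = data[i:i + width]                  # load the register
        result.extend(x * factor for x in register)   # one vector operation
    return result

values = [1.0, 2.0, 3.0, 4.0, 5.0]
assert sisd_scale(values, math.pi) == simd_scale(values, math.pi)
```

Either way the same results are produced; the SIMD machine simply needs far fewer instruction steps for large arrays.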
- Multiprocessing computer systems are MIMD machines with the CPUs in
- one system sharing all the system resources such as memory, I/O and
- buses. There are two main types of multiprocessing systems in use today,
- asymmetric and symmetric.
- Figure 4 shows an asymmetric multiprocessing system found in several
- graphic workstations. Even though there are two processors in the
- system, they are not equal. The main processor runs the operating system
- and user programs, accesses the main memory, and also controls the
- actions of the attached graphics processor. The attached graphics
- processor handles the displaying and updating of the graphical display,
- but does not actually modify any of the data in the main memory nor does
- it execute any user programs. The attached processor does improve the
- performance of the overall system, because the main processor does not
- need to spend its resources on the graphical display, but it does not
- provide a user with the full power of the two processors.
-
- [Figure 4: Illustration: Graphics Workstation]
-
- Asymmetric multiprocessing systems are relatively easy to implement
- because fewer modifications need to be made to the operating system,
- memory bus structures and I/O. However, asymmetric systems have a
- potential performance bottleneck in the main processor. Figure 5 shows
- an expanded asymmetric system where there are also an attached vector
- processor and a floating point processor, in addition to a graphics
- processor. The three attached processors are controlled by the main
- processor and can thus sit idle if the main processor becomes busy.
- Asymmetric systems are best suited for special tasks such as heavy
- numeric calculations or simulations, where the system can be tailored
- for the specific task at hand.
-
- [Figure 5: Illustration: Graphics Workstation]
-
- In symmetric multiprocessing, all processors are equal and are not
- specialized for specific tasks. Figure 6 shows the HP 890S symmetric
- multiprocessing system where each processor can access memory, I/O and
- other parts of the system. The operating system and hardware components
- need to maintain data consistency, job scheduling and message passing
- between processors. There are two main implementations of symmetric
- multiprocessing: master/slave and fully symmetric. While the hardware
- may be symmetric, the differences between these two implementations are
- mainly in how the operating system handles two main tasks: job
- scheduling and kernel access.
-
- [Figure 6: Illustration: HP 890S Multiprocessing System]
-
- The job scheduling algorithm determines how jobs are parceled out and
- scheduled to each processor, and is a major factor in overall
- multiprocessor performance. Figure 7 shows the job scheduling for
- master/slave systems, where one processor (the master) assigns jobs to
- the other processors (the slaves). In fully symmetric systems, each
- processor takes the next job waiting in a global system queue.
-
- [Figure 7: Illustration: Job Scheduling]
-
- To show how the two scheduling routines work, think of waiting for the
- next available cashier at a store. In a master/slave store, one of the
- cashiers has the responsibility of assigning customers to the next
- available cashier. If the master cashier is busy when one of the slave
- cashiers signals that they are ready for another customer, there will
- be a delay before a customer is sent to the slave cashier, slowing
- down overall throughput. This store might instead modify the routine
- slightly and have the master cashier assign each incoming customer
- straight away to one cashier. However, if for some reason there is a
- holdup at one of the cashiers (because of some pricing difficulty),
- then all the other customers in line for that cashier are stuck, even
- though the other lines continue to move quickly. Either way, overall
- throughput is lowered because the master cashier (processor) must
- spend time assigning, and possibly reassigning, customers (jobs) to
- the slave cashiers (processors). Also, individual customers (jobs) may
- have to wait much longer than others before they are serviced by a
- cashier (processor), simply because they were assigned to a slow line
- to begin with.
- In contrast, in a store that implements a fully symmetric system, no
- one cashier assigns customers to the other cashiers. All incoming
- customers line up in one queue, and when they reach the head of the
- line they go immediately to the next available cashier. The advantage
- is that no extra time or resources are spent allocating customers
- (jobs) to specific cashiers (processors), and any queued customer
- (job) is more likely to be serviced in a timely fashion.
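The fully symmetric, single-queue approach can be sketched as follows. This is an illustrative model only, with worker threads standing in for processors; the names and job counts are invented:

```python
import queue
import threading

# Toy model of fully symmetric scheduling: every "processor" pulls its
# next job from one shared global run queue, so load balancing happens
# automatically with no master assigning work.
def run_symmetric(jobs, n_processors):
    run_queue = queue.Queue()
    for job in jobs:
        run_queue.put(job)
    completed = []
    lock = threading.Lock()

    def processor():
        while True:
            try:
                job = run_queue.get_nowait()   # take the next waiting job
            except queue.Empty:
                return                         # queue drained, go idle
            result = job()                     # "execute" the job
            with lock:
                completed.append(result)

    workers = [threading.Thread(target=processor) for _ in range(n_processors)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return completed

results = run_symmetric([lambda i=i: i * i for i in range(8)], n_processors=4)
assert sorted(results) == [0, 1, 4, 9, 16, 25, 36, 49]
```

No processor ever waits on a master for its next assignment; whichever processor frees up first simply takes the job at the head of the queue.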
- Kernel access, or the lack of access, can also hinder multiprocessor
- performance. As figure 8 shows, some implementations of master/slave
- systems limit the execution of the UNIX kernel (the operating system
- core) and access to I/O to only the master processor, further pinching
- the master processor bottleneck. Examples of master/slave
- multiprocessing can be seen in systems from Sun and Solbourne. These
- types of systems are best suited to environments that have little I/O
- activity or that have intensive numerical tasks that can be easily
- scheduled. In a heavy multiuser environment, users of master/slave
- systems (while gaining some improvement) will fail to see the full
- extra performance promised by additional processors, as the extra
- processors spend more and more time waiting for kernel or I/O requests
- to be filled by the master processor.
-
- [Figure 8: Illustration: Kernel and I/O Access]
-
- Fully symmetric multiprocessing (SMP) systems, such as the HP 890S,
- are more difficult to implement, but are better suited to heavy
- multiuser or I/O intensive environments, such as Online Transaction
- Processing (OLTP) and RDBMS applications. Each processor is the equal
- of the others and so can access I/O or the UNIX kernel directly,
- instead of sending all such requests to a master processor and then
- waiting for service. In general, SMP systems provide better scaling
- for commercial environments such as OLTP applications. Mixed workloads
- also run best on an SMP system, because they are likely to generate a
- high volume of I/O and kernel calls.
-
- Application fit with Multiprocessing
- Applications that appear to the UNIX system as just one large process
- will most likely not run faster on a multiprocessing system. Such
- applications are called single threaded and typically cannot be split
- up among multiple processors. Large batch jobs, common in commercial
- environments, are one example of single threaded applications. In
- order to be split up over multiple processors, a single threaded
- application needs to be either "broken up" explicitly into multiple
- threads or processes, or implicitly broken up with the help of special
- parallel compilers. However, general purpose parallel compilers are
- still not available for the commercial marketplace. For today, large
- batch applications run best on an SMP system where each individual
- processor has high performance, rather than on an SMP system using
- many lower performance processors. The balance between batch jobs and
- mixed applications (such as OLTP) is one of the reasons that the HP
- 890S multiprocessing system was implemented with a few high
- performance processors, rather than with a larger number of lower
- performance processors. The HP 890S offers the fastest combined batch
- and OLTP system performance due to its fast individual processors.
- RDBMS applications can also be tuned explicitly for MP systems for
- increased OLTP performance. HP has worked, and continues to work,
- closely with key industry leading RDBMS solution providers to optimize
- their applications specifically for the HP 890S MP system.
- Multiprocessing Challenges
- A fully symmetric multiprocessing system needs to overcome several
- hurdles without compromising the performance potential of the
- additional processors. The main challenges are ensuring data
- integrity, job scheduling, I/O and kernel access, and application
- transparency. The fully symmetric multiprocessing implementation of
- the HP 890S will be used as an example of how these design challenges
- have been met.
-
- Ensuring Data Integrity
- As seen in figure 6, the HP 890S is a tightly coupled system with all
- processors sharing the same memory. It is very likely that different
- jobs running on different processors will need to access the same memory
- locations, and thus the system must ensure that data integrity is
- maintained at all times with little or no contention. The HP 890S solves
- this problem in hardware with a "snoopy" cache protocol. All processors
- have their own local cache and memory controllers, and manipulate data
- in memory through their caches. Processors can either share cache lines,
- or mark lines as private. Each memory controller listens to every
- transaction on the System Memory Bus (SMB) (or "snoops," hence the
- name) and acts as needed to maintain data integrity. For example, if
- processors A and B share a cache line, and processor A then needs to
- write to the shared line, A broadcasts the write and marks the cache
- line as private. Meanwhile, processor B picks up the write request and
- marks its copy of the cache line as invalid, leaving the modified line
- on processor A as the only valid copy. The HP 890S also avoids another
- potential bottleneck, the extra traffic on the SMB. The SMB has a
- bandwidth of 800 MB/s, ensuring that bus contention between the
- processors does not become a drag on performance.
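A write-invalidate protocol of this general kind can be sketched as a toy simulation. This is not HP's implementation; the states and class names below are simplified inventions (a real protocol such as MESI tracks more states and also snoops reads):

```python
# Toy write-invalidate ("snoopy") cache model: every cache watches the
# shared bus and invalidates its own copy when another cache broadcasts
# a write to a line it holds. States are simplified to SHARED / PRIVATE
# / INVALID, and writes go through to memory for simplicity.
class Cache:
    def __init__(self, name, bus):
        self.name = name
        self.lines = {}          # addr -> (state, value)
        self.bus = bus
        bus.append(self)         # join the shared bus

    def read(self, addr, memory):
        if self.lines.get(addr, ("INVALID", None))[0] == "INVALID":
            self.lines[addr] = ("SHARED", memory[addr])   # fill from memory
        return self.lines[addr][1]

    def write(self, addr, value, memory):
        for other in self.bus:                            # broadcast the write
            if other is not self and addr in other.lines:
                other.lines[addr] = ("INVALID", None)     # snoop and invalidate
        self.lines[addr] = ("PRIVATE", value)
        memory[addr] = value                              # write-through

bus, memory = [], {0x100: 7}
a, b = Cache("A", bus), Cache("B", bus)
assert a.read(0x100, memory) == 7 and b.read(0x100, memory) == 7  # both SHARED
a.write(0x100, 42, memory)              # A's write invalidates B's copy
assert b.lines[0x100][0] == "INVALID"
assert b.read(0x100, memory) == 42      # B refetches the valid data
```

The key point the model captures is that no cache can silently serve stale data: a write by one processor forces every other holder of that line to refetch it.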
-
- Job Scheduling
- A fully symmetric MP system has already been shown in figure 7 to use a
- single job queue for all processors. The next waiting job is sent to the
- next available processor. While a single job queue is an efficient MP
- scheduling method in general because of its inherent dynamic load
- balancing, it can be made more efficient by recognizing that not all
- jobs finish after one run on a processor. Often jobs or processes are
- scheduled to run for some length of time and are then swapped out and
- placed back in the job queue while another job is swapped in. However,
- a process builds up local state on the processor it runs on (such as
- data in that processor's cache). If that process is later scheduled to
- run on another processor, the second processor must spend time
- rebuilding the same state already resident on the first processor. The
- HP 890S removes this potential source of inefficiency by using a
- heuristic scheduling algorithm where each process develops an affinity
- to one particular processor, but can move to another processor if needed
- for load balancing. To understand this heuristic scheduling method,
- imagine a process is handled the same way as a person who constantly
- checks in and out of a hotel. If the hotel kept checking this person
- into the same room, then some of the person's baggage could remain in
- the room when the person wasn't in (assuming of course that everyone
- else has the courtesy to leave it alone), making it very easy and quick
- for this person (and the hotel) to check back in. If it was necessary to
- move to another room, or if this person was not coming back, only then
- would the effort be spent to pack up all this person's belongings and
- move them.
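A possible shape for such an affinity heuristic is sketched below. The threshold and load bookkeeping are invented for illustration and are not HP-UX's actual algorithm:

```python
# Sketch of affinity scheduling: a process prefers the processor whose
# cache it last "warmed", but migrates when its preferred processor is
# much busier than the least loaded one. The imbalance threshold is an
# invented tuning parameter.
def pick_processor(process, loads, affinity, imbalance=2):
    preferred = affinity.get(process)
    least_loaded = min(loads, key=loads.get)
    if preferred is not None and loads[preferred] - loads[least_loaded] < imbalance:
        choice = preferred        # cache is still warm: stay put
    else:
        choice = least_loaded     # worth paying the migration cost
    affinity[process] = choice
    loads[choice] += 1
    return choice

loads = {"cpu0": 0, "cpu1": 0}
affinity = {}
assert pick_processor("p1", loads, affinity) == "cpu0"  # first run: least loaded
assert pick_processor("p1", loads, affinity) == "cpu0"  # affinity keeps it there
loads["cpu0"] += 5                                      # cpu0 now overloaded
assert pick_processor("p1", loads, affinity) == "cpu1"  # migrate for balance
```

Like the hotel guest, the process keeps returning to the same "room" while that remains cheap, and packs up and moves only when staying would cost more than the move.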
-
- Kernel Access
- In order to maximize performance, the HP 890S allows multiple processes
- to be executed in the UNIX kernel at the same time. Without this
- capability, it is very likely that processors could sit idle while they
- wait for access to the kernel. The HP-UX kernel data structures are
- broken up into several pieces with synchronization variables called
- semaphores used to protect each block of data structures. A processor
- must lock the appropriate semaphore before it can access the desired
- kernel section. If another processor has already locked the semaphore,
- the requesting processor must wait until the semaphore is unlocked.
- Kernel contention is reduced by having each semaphore lock only a
- small part of the kernel. The kernel semaphores also ensure that no
- I/O collisions occur between different processors. This is in contrast
- to master/slave systems, where the UNIX kernel runs only on the master
- processor and there is thus no need to add semaphores to the kernel.
- This is one reason why master/slave systems are easier to implement.
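The idea of protecting separate kernel sections with separate semaphores can be sketched with ordinary thread locks. The subsystem names are invented; this illustrates only the locking structure, not HP-UX internals:

```python
import threading

# Fine-grained kernel locking: instead of one big lock for the whole
# kernel, each subsystem's data structures get their own lock, so
# processors working in different subsystems never contend with each
# other. Updates within a subsystem remain fully serialized.
kernel_locks = {name: threading.Lock()
                for name in ("scheduler", "filesystem", "network")}
kernel_data = {"scheduler": 0, "filesystem": 0, "network": 0}

def kernel_call(subsystem, times):
    for _ in range(times):
        with kernel_locks[subsystem]:    # lock only this subsystem
            kernel_data[subsystem] += 1  # protected update

# Two "processors" per subsystem, all running concurrently:
threads = [threading.Thread(target=kernel_call, args=(name, 1000))
           for name in kernel_locks for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert all(count == 2000 for count in kernel_data.values())
```

Every counter comes out exact because each subsystem's lock serializes its own updates, yet callers into different subsystems never had to wait on each other.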
-
- Application Transparency
- Multiprocessor systems should ideally be transparent to applications,
- meaning that an application can run on single processor or
- multiprocessor systems without any modification. For most applications,
- this will be the case for the HP 890S. The HP-UX system call interfaces
- for the multiprocessor systems are the same as for a single processor
- system. Applications that are structured as a set of cooperating
- processes may experience some problems in a multiprocessor system
- because processes may not execute in the same order as in a single
- processor system. These applications should be tested to identify any
- potential timing problems.
-
- Future Multiprocessing Trends
- Many system vendors will continue to refine their multiprocessing
- offerings. Several vendors such as HP will incorporate advanced
- technology for improved multiprocessor performance.
- HP will continue to enhance its multiprocessing offering. Presently
- HP offers a 4-way multiprocessor system with the HP 890S Corporate
- Business Server. The HP 890S will be extended to an 8-way and later a
- 16-way multiprocessing system. In order to support these higher levels
- of multiprocessing, HP will break up the HP-UX kernel into smaller
- pieces to allow increased simultaneous access by many more CPUs. Each
- kernel piece and data structure will continue to be protected with
- semaphores.
- HP-UX will also support multiple threads in the kernel. Threads are
- special lightweight UNIX processes. A traditional UNIX process (called
- a task) cannot be broken down into smaller units, so if a UNIX system
- needs to start a new task, it does so with significant overhead. With
- threads, a new unit of work can be started with much less overhead
- (by, for example, sharing the same address space and common memory),
- leading to increased performance. Also, tasks can be broken down into
- several related threads, leading to a finer level of granularity. This
- increased granularity will improve multiprocessor performance because
- of the parallelism inherent in threads. HP-UX will also implement a
- communication mechanism called ports as a way for threads to know
- about each other, to talk to one another and to synchronize related
- threads. Support for threads will be added first for user applications
- and then for the HP-UX kernel. With these enhancements, the HP 890S
- will continue to offer significant performance increases over the next
- few years.
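The practical benefit of a shared address space can be sketched with ordinary threads: all of them update one shared structure directly rather than each holding a private copy (an illustration of the concept, not of HP-UX threads specifically):

```python
import threading

# Threads are lighter than full processes partly because they share one
# address space: every thread here updates the same counter object
# directly, with a lock for consistency, instead of each receiving a
# private copy of the data as separately forked processes would.
counter = {"value": 0}
lock = threading.Lock()

def worker(increments):
    for _ in range(increments):
        with lock:
            counter["value"] += 1   # shared memory, nothing copied

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter["value"] == 4000
```

Four threads of one task cooperated on a single data structure with no copying or message passing, which is exactly the low-overhead sharing that makes threads attractive on an MP system.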
-
- Massively Parallel Systems
- Much attention has been given to research on massively parallel
- multiprocessor systems. Figure 9 shows that while the hardware
- implementation is straightforward and the potential performance gains
- are quite large, the software needed to split up processes, coordinate
- them and maintain data consistency is not. SMP systems like
- the HP 890S are extensions of single processor systems with a common bus
- to easily share and maintain consistency of a common memory. An SMP
- software environment is very similar, if not identical, to a single
- processor software environment, with the benefit of existing
- applications running without modification. In contrast, applications
- will still have to be written specifically for massively parallel
- systems. Applications such as complex numeric simulations (e.g.
- airflow analysis for the space shuttle) that can easily be broken down
- into individual, well defined subprocesses will continue to be the
- best fit for massively parallel systems. For commercial applications,
- special parallel compilers and development tools that would
- automatically break up applications while maintaining data consistency
- still need to be developed. While progress is being made in this area,
- general purpose parallel compilers are still estimated to be years
- away.
-
- [Figure 9: Illustration: Massively Parallel System]
-
- Distributed Computing
- In many respects, there are similarities between massively parallel
- multiprocessing systems and computer environments made up of multiple
- computers connected by a high-speed network, often called a
- multicomputer. Figure 10 shows that a multicomputer consists of
- distributed systems (with their own private memories) connected
- together, executing processes in parallel, and using messages to
- synchronize and maintain data consistency. In order to provide a
- standards-based framework for multicomputers (or distributed
- computing), the Open Software Foundation (OSF) has developed the
- Distributed Computing Environment (DCE) specification. DCE provides a
- common framework for sending process requests over a network, as well
- as a distributed file system that can be shared by many computers.
- With DCE, a large application can be partitioned into several parts
- that are then parceled out to several computers on the network using
- Remote Procedure Calls (RPC). Each computer works in parallel on its
- portion, with the result that the overall application executes much
- faster than it would on one individual system. Multicomputer systems
- have the advantage of using general purpose computers, which are
- typically very cost effective and can continue to execute
- non-distributed applications as well as distributed ones. Of course,
- an SMP system could be one of the systems in a multicomputer
- environment.
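The overall pattern (partition the work, farm the parts out, gather the results) can be sketched as follows. Worker threads stand in for networked computers and the names are invented; a real DCE application would invoke the parts through RPC stubs:

```python
from concurrent.futures import ThreadPoolExecutor

# Rough sketch of the DCE model: an application is partitioned into
# parts that are parceled out to several workers, each computing its
# portion in parallel, and the partial results are gathered. Local
# threads stand in here for remote computers reached via RPC.
def partial_sum(chunk):
    return sum(x * x for x in chunk)        # this "computer's" portion

def distributed_sum_of_squares(data, n_workers=4):
    size = max(1, len(data) // n_workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(partial_sum, chunks))   # gather the results

assert distributed_sum_of_squares(list(range(100))) == sum(x * x for x in range(100))
```

The caller sees one answer for one large job, even though the portions were computed in parallel, which is the same transparency a distributed application aims for across a network.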
-
- [Figure 10: Illustration: Distributed Computing]
-
- Summary
- Multiprocessing systems provide the extra levels of performance needed
- for demanding environments. While there are several types of
- multiprocessing implementations, fully symmetric multiprocessing systems
- offer the most balanced performance for commercial and mixed workload
- environments. SMP systems most fully utilize the extra performance
- available with additional processors. Systems such as the HP 890S
- provide SMP transparently to existing applications that were originally
- written for single processor systems. Standards development groups such
- as OSF will provide operating system technologies that will improve
- multiprocessing performance. While massively parallel systems will
- continue to be advanced, SMP systems with multicomputer networks will
- remain cost effective solutions for meeting general purpose computing
- needs.
-
-